3 research outputs found

    Susceptible workload driven selective fault tolerance using a probabilistic fault model

    In this paper, we present a novel fault tolerance design technique, applicable at the register transfer level, that protects the functionality of logic circuits using a probabilistic fault model. The proposed technique selects the most susceptible workload of combinational circuits to protect against probabilistic faults. Workload susceptibility is ranked by the likelihood that any fault bypasses the inherent logical masking of the circuit and propagates an erroneous response to its outputs when that workload is executed. The workload is protected through a Triple Modular Redundancy (TMR) scheme built from the patterns evaluated as most susceptible. We apply the proposed technique to the LGSynth91 and ISCAS85 benchmarks and evaluate its fault tolerance capabilities against errors induced by permanent faults and soft errors. We show that the proposed technique, when applied to protect only the 32 most susceptible patterns, achieves, on average across all the examined benchmarks, an error coverage improvement of 98% and 94% against errors induced by single stuck-at faults (permanent faults) and soft errors (transient faults), respectively, compared to a reduced TMR scheme that protects the same number of susceptible patterns without ranking them.
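The ranking idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy circuit `out = (a AND b) OR (c AND d)`, the uniform single stuck-at fault list, and the names `evaluate`, `susceptibility`, and `rank_patterns` are all assumptions made for illustration, and the paper's probabilistic fault model is replaced here by a plain count of faults that escape logical masking.

```python
from itertools import product

# Hypothetical toy circuit: out = (a AND b) OR (c AND d).
# Fault sites are the four inputs and the two internal nets.
NETS = ["a", "b", "c", "d", "n1", "n2"]
# Assumed fault list: every net stuck at 0 and stuck at 1, equally likely.
FAULTS = [(net, value) for net in NETS for value in (0, 1)]


def evaluate(pattern, fault=None):
    """Return the circuit output for a 4-bit pattern, optionally with one stuck-at fault."""
    def force(net, value):
        # Override the net's value if the injected fault sits on this net.
        if fault is not None and fault[0] == net:
            return fault[1]
        return value

    a, b, c, d = (force(net, bit) for net, bit in zip("abcd", pattern))
    n1 = force("n1", a & b)
    n2 = force("n2", c & d)
    return n1 | n2


def susceptibility(pattern):
    """Fraction of faults that are NOT logically masked for this pattern."""
    good = evaluate(pattern)
    escaped = sum(evaluate(pattern, fault) != good for fault in FAULTS)
    return escaped / len(FAULTS)


def rank_patterns(k):
    """Return the k input patterns with the highest susceptibility."""
    patterns = list(product((0, 1), repeat=4))
    return sorted(patterns, key=susceptibility, reverse=True)[:k]


if __name__ == "__main__":
    for p in rank_patterns(4):
        print(p, susceptibility(p))
```

In a selective scheme of the kind described, only the top-ranked patterns would be covered by the redundant logic, which is where the area saving over full TMR would come from.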

    Fault tolerance & error monitoring techniques for cost constrained systems

    With technology scaling, the reliability of circuits is becoming a growing concern. The appearance of logic errors in-the-field caused by faults escaping manufacturing testing, single-event upsets, aging, or process variations is increasing. Traditional techniques for online testing and circuit protection often require a high design effort or result in high area overhead and power consumption, and are unsuitable for low cost systems. This thesis presents three original contributions in the form of low cost techniques for online error detection and protection in cost constrained systems. The first contribution consists of a low cost fault tolerance design technique that protects the most susceptible workload on the most susceptible logic cones of a circuit, by targeting both timing-independent and timing-dependent errors. The susceptible workload is protected by a partial Triple Modular Redundancy (TMR) scheme. Protecting the 32 most susceptible patterns, an average error coverage improvement of 63.5% and 58.2% against errors induced by stuck-at and transition faults is achieved, respectively, compared to an unranked pattern selection and protection. Additionally, this technique produces an average error coverage improvement of 163% and 96% against temporary erroneous output transitions and errors induced by bit-flips, respectively. These error coverage improvements incur an area/power cost in the range of 18.0-54.2%, a 145.8-182.0% reduction compared to TMR. The second contribution proposes a low cost probabilistic online error monitoring technique that produces an alarm signal when systematic erroneous behaviour has occurred over a pre-defined time interval. To detect systematic erroneous behaviour, the collected data is compared on-chip against the signature of error-free behaviour. On the largest circuits, results demonstrate an average error coverage of 84.4% and 73.1% of errors induced by bit-flips and stuck-at faults, respectively, with an average area cost of 1.66%. The final contribution consists of a circuit approximation technique that can be used for low cost non-intrusive fault tolerance and concurrent error detection, based on finding functionality at the logic level that behaves similarly to single logic gates or constant values. An algorithm is proposed to select the input subsets to approximate. Results show an average coverage of 33.59% of all the input space with an average 7.43% area cost. Using these approximate circuits in a reduced TMR scheme results in significant area cost reductions compared to existing techniques.

    Low cost error monitoring for improved maintainability of IoT applications

    Electronic systems with power-constrained embedded devices are used for a variety of IoT applications, such as geomonitoring, parking sensors and surveillance. Such applications may tolerate a few errors. However, with the increasing occurrence of faults in-the-field, devices that exhibit systematic erroneous behaviour must eventually be identified and replaced. In this paper, we propose a novel low cost error monitoring technique to assist the maintainability planning of low power IoT applications by ranking devices based on the systematic erroneous behaviour they exhibit. Small on-chip monitors collect the signal probability information at the outputs of each device, which is then transmitted to the system software via the communications channel of the system so that the devices can be ranked accordingly. To evaluate the error monitoring capabilities of the proposed technique, we injected multiple bit-flips and stuck-at faults on a set of the EPFL and the ISCAS benchmarks. Results demonstrate an average error coverage of 84.4% and 73.1% of errors induced by bit-flips and stuck-at faults, respectively, with an average area cost of 1.52%. A maintainability planning simulation shows that the proposed technique achieves a reduction of 26x to 263x in area cost and static power, and consumes over 625x less power for communications when compared against duplication and comparison.
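As a rough illustration of ranking devices by signal probability, the sketch below estimates the fraction of cycles each output bit is 1 and scores every device by its largest deviation from an error-free reference. The function names, the maximum-deviation score, and the sample data are assumptions chosen for this example; the abstract does not specify the metric or the monitor design actually used.

```python
def signal_probabilities(samples):
    """Fraction of observed cycles in which each output bit was 1."""
    cycles = len(samples)
    width = len(samples[0])
    return [sum(sample[i] for sample in samples) / cycles for i in range(width)]


def deviation(observed, reference):
    """Assumed score: largest per-output gap from the error-free signature."""
    return max(abs(o - r) for o, r in zip(observed, reference))


def rank_devices(device_samples, reference):
    """Return device names ordered from most to least deviant."""
    scores = {
        name: deviation(signal_probabilities(samples), reference)
        for name, samples in device_samples.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical error-free signature: each of two outputs is 1 half of the time.
    reference = [0.5, 0.5]
    observed = {
        "healthy": [(0, 1), (1, 0), (0, 1), (1, 0)],
        "stuck": [(1, 1), (1, 0), (1, 1), (1, 0)],  # first output stuck at 1
    }
    print(rank_devices(observed, reference))
```

A device whose output is stuck at a constant drifts away from the reference probability over time, so it rises to the top of the ranking even though no single cycle is checked for correctness.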